Update DB with TY2022 data #25
* 2022 has different column name for levy_plus_loss "levy+loss"
* update agency make sheet explicit, update across syntax, add 2022 column names
* Update cpihistory.pdf
* Switching to pdftools. Pretty sure this is the same but didn't want to use non-CRAN tabulizer
* from press release
* remove tabulizer
* add 2022
* add excel conversions
* update 2006 to 2012 to excel versions
* add 2022 tax code
* sample 2022 bills
* update with pdftools
* lint/style
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@            Coverage Diff            @@
##            master      #25   +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files            4        4
  Lines          439      439
=========================================
  Hits           439      439
Give priority to certain names on bills in the detail output and add names for TY2022
data-raw/tif/tif.R
Outdated
@erhla I had to revert your xlsx TIF changes since there were a bunch of errors in the conversions. The xlsx files were missing ~100 rows.
Ah oops. I'll look into that.
@@ -193,7 +193,7 @@ tax_bill <- function(year_vec,

   # Calculate the exemption effect by subtracting the exempt amount from
   # the total taxable EAV
-  dt[, agency_tax_rate := agency_total_ext / agency_total_eav]
+  dt[, agency_tax_rate := agency_total_ext / as.numeric(agency_total_eav)]
Fix for a weird integer overflow using the int64 type and 0 values. Coercing to numeric solves it fine 🤷
I've encountered this before with ptaxsim; it is an annoying int64 quirk.
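The coercion in the diff above can be illustrated with a minimal base R sketch. It does not reproduce the integer64 behavior discussed in this thread (that type comes from an add-on package); base R 32-bit integers are used as a stand-in, and all values below are made up for illustration. Only the column names are taken from the diff.

```r
# Stand-in vectors named after the columns in the diff. The real columns live
# in a data.table inside tax_bill(); these values are hypothetical.
agency_total_ext <- c(1500.25, 0, 320.10)
agency_total_eav <- c(2147483647L, 0L, 1000000L) # base R integers, NOT integer64

# Coerce the denominator to double before dividing, as the changed line does.
agency_tax_rate <- agency_total_ext / as.numeric(agency_total_eav)

# In double arithmetic, 0 / 0 is NaN rather than an error.
zero_case_is_nan <- is.nan(agency_tax_rate[2])

# Base R analogue of integer overflow: adding past the 32-bit maximum yields
# NA (with a warning), while the same arithmetic on doubles does not overflow.
overflowed <- suppressWarnings(.Machine$integer.max + 1L)
safe <- as.numeric(.Machine$integer.max) + 1
```

Zero denominators still produce `NaN` after coercion, so any downstream code would need to decide how to treat those rows.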
@erhla If you have the time, can you give this a quick skim? Else I'll merge it by EOD today (1/19).
Confirm this looks good. The int64 overflow was an issue with |
Looks good
This PR adds Tax Year 2022 data to the PTAXSIM database. It also slightly refactors the raw data processing for simplicity. Thanks @erhla for all the help on this PR. Closes #21.
Database Release Checklist
* Update .gitattributes such that the raw data files are tracked by git LFS
* Update the scripts (in data-raw/) that prepare and clean the data. These scripts will save the cleaned data to a staging area in S3. Ensure that the relevant S3 keys in the PTAXSIM bucket are updated using the AWS console or API
* In data-raw/create_db.R, increment the db_version variable following the schema outlined above
* Update the requires_pkg_version variable in data-raw/create_db.R
* Update the DESCRIPTION file:
  * Config/Requires_DB_Version: This is the minimum database version required for this version of the package. It should be incremented whenever there is a breaking change
  * Config/Wants_DB_Version: This is the maximum database version required for this version of the package. It is the version of the database pulled from S3 during CI/testing on GitHub
* Update the SQL statements in data-raw/create_db.sql. These statements define the structure of the database
* Run data-raw/create_db.R. This will create the SQLite database file by pulling data from S3. The file will be generated in a temporary directory (usually /tmp/Rtmp...), then compressed using pbzip2 (required for this script)
* Take the compressed database file from the temporary directory (found at db_path after running data-raw/create_db.R) and move it to the project directory. Rename the file ptaxsim-<TAX_YEAR>.<MAJOR VERSION>.<MINOR VERSION>.db.bz2
* Decompress the database file using pbzip2. The typical command will be something like pbzip2 -d -k ptaxsim-2021.0.2.db.bz2
* Rename the decompressed file to ptaxsim.db for local testing. This is the file name that the unit tests and vignettes expect
* Run the unit tests (devtools::test() in the console) and vignettes (pkgdown::build_site() in the console) locally
* Knit the README.Rmd file to update the database link at the top of the README. The link is pulled from the ptaxsim.db file's metadata table
* Move the compressed database file to S3: aws s3 mv ptaxsim-2021.0.2.db.bz2 s3://ccao-data-public-us-east-1/ptaxsim/ptaxsim-2021.0.2.db.bz2
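
Two of the checklist steps (the file naming scheme and the DESCRIPTION version fields) can be sketched in base R. This is only an illustration under stated assumptions: `db_file_name()` is a hypothetical helper that is not part of PTAXSIM, the DESCRIPTION contents are a made-up stand-in written to a temporary file, and the version strings are assumed to follow the same `<TAX_YEAR>.<MAJOR VERSION>.<MINOR VERSION>` pattern as the file name.

```r
# Hypothetical helper (not part of PTAXSIM): build the compressed database
# file name following ptaxsim-<TAX_YEAR>.<MAJOR VERSION>.<MINOR VERSION>.db.bz2
db_file_name <- function(tax_year, major, minor) {
  sprintf("ptaxsim-%d.%d.%d.db.bz2", tax_year, major, minor)
}

file_name <- db_file_name(2021L, 0L, 2L)

# Write a stand-in DESCRIPTION so the sketch is self-contained. The package
# name and version values here are invented for illustration.
desc_path <- tempfile("DESCRIPTION")
writeLines(
  c(
    "Package: examplepkg",
    "Version: 0.0.1",
    "Config/Requires_DB_Version: 2021.0.1",
    "Config/Wants_DB_Version: 2021.0.2"
  ),
  desc_path
)

# Read only the two Config fields named in the checklist.
fields <- c("Config/Requires_DB_Version", "Config/Wants_DB_Version")
desc <- read.dcf(desc_path, fields = fields)

requires <- numeric_version(desc[1, "Config/Requires_DB_Version"])
wants <- numeric_version(desc[1, "Config/Wants_DB_Version"])

# Sanity check: the minimum required version should not exceed the version
# pulled during CI/testing.
versions_ok <- requires <= wants
```

Comparing with `numeric_version()` rather than as strings keeps a value such as 2021.0.10 ordered after 2021.0.2.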